A Support Tool for Deriving Domain Taxonomies from Wikipedia

Authors

  • Lili Kotlerman
  • Zemer Avital
  • Ido Dagan
  • Amnon Lotan
  • Ofer Weintraub

Abstract

Organizing data into category hierarchies (taxonomies) is useful for content discovery, search, exploration, and analysis. In industrial settings, targeted taxonomies for specific domains are mostly created manually, typically by domain experts, which is time-consuming and requires a high level of expertise. This paper presents an algorithm and an implemented interactive system for automatically generating target-domain taxonomies based on the Wikipedia Category Hierarchy. The system also enables human post-editing, facilitated by intelligent assistance.

Similar resources

Towards Supporting Exploratory Search over the Arabic Web Content: The Case of ArabXplore

Due to the huge amount of data published on the Web, the Web search process has become more difficult, and it is sometimes hard to get the expected results, especially when the users are less certain about their information needs. Several efforts have been proposed to support exploratory search on the web by using query expansion, faceted search, or supplementary information extracted from exte...


Unsupervised Knowledge Extraction for Taxonomies of Concepts from Wikipedia

A novel method for unsupervised acquisition of knowledge for taxonomies of concepts from raw Wikipedia text is presented. We assume that the concepts classified under the same node in a taxonomy are described in a comparable way in Wikipedia. The concepts in 6 taxonomies extracted from WordNet are mapped onto Wikipedia pages and the lexico-syntactic patterns describing semantic structures expre...


A Short Survey on Taxonomy Learning from Text Corpora: Issues, Resources and Recent Advances

A taxonomy is a semantic hierarchy, consisting of concepts linked by is-a relations. While a large number of taxonomies have been constructed from human-compiled resources (e.g., Wikipedia), learning taxonomies from text corpora has received growing interest and is essential for long-tailed and domain-specific knowledge acquisition. In this paper, we overview recent advances on taxonomy constr...


A Category-Driven Approach to Deriving Domain Specific Subset of Wikipedia

While many researchers attempt to build different kinds of ontologies by means of Wikipedia, the possibility of deriving a high-quality domain-specific subset of Wikipedia using its own category structure still remains undervalued. We prove the necessity of such processing in this paper and also propose an appropriate technique. As a result, the size of knowledge base for our text processing fr...


USAAR at SemEval-2016 Task 13: Hyponym Endocentricity

This paper describes our submission to the SemEval-2016 Taxonomy Extraction Evaluation (TExEval-2) Task. We examine the endocentric nature of hyponyms and propose a simple rule-based method to identify hypernyms at high precision. For the food domain, we extract lists of terms from the Wikipedia lists of lists by using the name of each list as the endocentric head and treating all terms in the ...



Publication date: 2011